A word-grammar based morphological analyzer for agglutinative languages

Authors

  • Itziar Aduriz
  • Eneko Agirre
  • Izaskun Aldezabal
  • Iñaki Alegria
  • Xabier Arregi
  • Jose Maria Arriola
  • Xabier Artola
  • Koldo Gojenola
  • Montse Maritxalar
  • Kepa Sarasola
  • Miriam Urkia
Abstract

Agglutinative languages present rich morphology, and for some applications they need deep analysis at word level. The work presented here proposes a model for designing a full morphological analyzer. The model integrates the two-level formalism and a unification-based formalism. In contrast to other works, we propose to separate the treatment of sequential and non-sequential morphotactic constraints. Sequential constraints are applied in the segmentation phase, and non-sequential ones in the final feature-combination phase. Early application of sequential morphotactic constraints during the segmentation process makes feasible an efficient implementation of the full morphological analyzer. The result of this research has been the design and implementation of a full morphosyntactic analysis procedure for each word in unrestricted Basque texts.
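The two-phase split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicon, the morpheme classes, the `CONTINUATIONS` table, and the function names `segment`, `unify`, and `analyze` are all hypothetical, the morphemes are invented rather than real Basque, and the morphophonological rules that a two-level formalism would handle are left out entirely. The sketch only shows sequential constraints pruning candidates during segmentation, with a non-sequential constraint (agreement between a stem and a non-adjacent suffix) checked afterwards by feature unification.

```python
from typing import Dict, List, Optional, Tuple

Features = Dict[str, str]
Morpheme = Tuple[str, str, Features]  # (surface form, morphotactic class, features)

# Hypothetical toy lexicon: invented morphemes, not real Basque data.
LEXICON: List[Morpheme] = [
    ("ka", "stem", {"cat": "noun"}),
    ("ka", "stem", {"cat": "verb"}),
    ("ti", "suffix1", {"number": "pl"}),
    ("ra", "suffix2", {"cat": "noun", "case": "loc"}),
]

# Sequential morphotactics: which class may directly follow which.
CONTINUATIONS: Dict[str, set] = {
    "START": {"stem"},
    "stem": {"suffix1", "suffix2", "END"},
    "suffix1": {"suffix2", "END"},
    "suffix2": {"END"},
}


def segment(word: str, prev: str = "START") -> List[List[Morpheme]]:
    """Phase 1: enumerate segmentations permitted by the continuation classes."""
    if not word:
        return [[]] if "END" in CONTINUATIONS[prev] else []
    results: List[List[Morpheme]] = []
    for form, cls, feats in LEXICON:
        # The sequential constraint is applied here, before recursing further.
        if cls in CONTINUATIONS[prev] and word.startswith(form):
            for rest in segment(word[len(form):], cls):
                results.append([(form, cls, feats)] + rest)
    return results


def unify(a: Features, b: Features) -> Optional[Features]:
    """Flat feature unification: returns None when two values conflict."""
    merged = dict(a)
    for key, value in b.items():
        if merged.get(key, value) != value:
            return None
        merged[key] = value
    return merged


def analyze(word: str) -> List[Features]:
    """Phase 2: combine the features of each segmentation, dropping clashes."""
    analyses: List[Features] = []
    for segmentation in segment(word):
        feats: Optional[Features] = {}
        for _form, _cls, morpheme_feats in segmentation:
            feats = unify(feats, morpheme_feats)
            if feats is None:
                break  # non-sequential constraint violated
        if feats is not None:
            analyses.append(feats)
    return analyses
```

In this toy setup, `segment("katira")` yields two candidate segmentations (noun stem and verb stem), and `analyze("katira")` keeps only the noun reading because the invented suffix `ra` requires `cat: noun`, a requirement that holds across the intervening `ti`. A form such as `tika` is rejected already in the segmentation phase, since a suffix cannot start a word.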

Similar resources

Computational Morphology and Natural Language Parsing for Indian Languages: A Literature Survey

Computational morphology and natural language parsing are two important and essential tasks required by a number of natural language processing applications, including machine translation. Developing full-fledged morphological analyzer and generator (MAG) tools or natural language parsers for highly agglutinative languages is a challenging task. The function of morphological analyzer ...

An Unsupervised Morpheme-Based HMM for Hebrew Morphological Disambiguation

Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew language, which combines morphemes into a word in both ag...

A Paradigm-Based Finite State Morphological Analyzer for Marathi

A morphological analyzer forms the foundation for many NLP applications of Indian Languages. In this paper, we propose and evaluate the morphological analyzer for Marathi, an inflectional language. The morphological analyzer exploits the efficiency and flexibility offered by finite state machines in modeling the morphotactics while using the well devised system of paradigms to handle the stem a...

Combining Stochastic and Rule-Based Methods for Disambiguation in Agglutinative Languages

In this paper we present the results of combining stochastic and rule-based disambiguation methods applied to the Basque language. The methods we have used in disambiguation are the Constraint Grammar formalism and an HMM-based tagger developed within the MULTEXT project. As Basque is an agglutinative language, a morphological analyser is needed to attach all possible readings to each word. T...

A Rule-Based Morphological Disambiguator for Turkish

Part-of-speech (POS) tagging is the process of assigning each word of an input text to an appropriate morphological class. Automatic recognition of parts of speech is very important for high-level NLP applications, since it would usually be infeasible to perform this task manually in practical systems. One approach to POS tagging uses morphological disambiguation which selects the most suitab...

Publication date: 2000